Project Context

• RADAR: Goal is to help desktop users
  - Personal Assistant that Learns (PAL)
  - Test environment: conference planning
  - Primary input: email messages!
• Requirements include:
  - Preprocessing of email messages
    • Segmentation, typo correction, etc.
    • Syntactic parsing
    • General and domain-specific semantic interpretation
    • Domain-specific task request extraction
  - Original content preserved

The RADAR CPE

[Figure: RADAR Collection Processing Engine (CPE). Email messages enter a Collection Reader, pass through a chain of Annotators, and come out as message annotations (tags).]

1. A new email message is stored in the Annotations Database.
2. A UIMA Collection Processing Engine is invoked. Stand-off annotations (tags) are created to capture the system’s understanding of the email.
3. The results are stored in the Annotations Database for use by other RADAR agents.
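
The flow above can be modeled very roughly in plain Java: each annotator in the chain reads the message text and appends stand-off annotations (a type plus begin/end character offsets), so the original content is never modified. This is a hypothetical, dependency-free sketch; `CpeSketch`, `Annotation`, `Annotator`, and `runPipeline` are invented names, not the UIMA API, where annotations live in the CAS and annotators run as analysis engines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free model of the CPE flow above.
// These names are illustrative only; they are not the UIMA API.
class CpeSketch {

    // A stand-off annotation points into the text by character offsets
    // instead of inserting tags, so the original message is preserved.
    static final class Annotation {
        final String type;
        final int begin; // inclusive
        final int end;   // exclusive

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        String coveredText(String text) {
            return text.substring(begin, end);
        }
    }

    // Each annotator reads the text (and earlier annotations) and appends its own.
    interface Annotator {
        void process(String text, List<Annotation> annotations);
    }

    // Runs the annotators in order over one message and collects the annotations.
    static List<Annotation> runPipeline(String text, List<Annotator> annotators) {
        List<Annotation> annotations = new ArrayList<>();
        for (Annotator annotator : annotators) {
            annotator.process(text, annotations);
        }
        return annotations;
    }

    public static void main(String[] args) {
        String message = "Blake, please move the sessions.";
        // Toy annotator: tags the text before the first comma as a salutation.
        Annotator salutation = (text, anns) -> {
            int comma = text.indexOf(',');
            if (comma > 0) {
                anns.add(new Annotation("Salutation", 0, comma));
            }
        };
        List<Annotator> chain = new ArrayList<>();
        chain.add(salutation);
        for (Annotation a : runPipeline(message, chain)) {
            System.out.println(a.type + " [" + a.begin + ", " + a.end + ") = " + a.coveredText(message));
        }
    }
}
```

Here `main` prints `Salutation [0, 5) = Blake`; in the real CPE, a CAS Consumer would instead write the collected annotations to the Annotations Database.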

RADAR Annotator Implementation

• Three varieties of implementation
  - MinorThird toolkit (William Cohen)
  - Java code
  - Client/server (primarily for legacy and external vendor software) with UIMA wrapper for client
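
For the client/server variety, the UIMA wrapper's job is to adapt the server's request/response contract to the annotator interface the pipeline expects. A minimal sketch, assuming a server that takes message text and returns labeled character spans; `SpanService`, `Span`, and `ServiceBackedAnnotator` are hypothetical names, and a real wrapper would extend a UIMA annotator base class and speak the vendor's actual protocol.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical adapter that makes a client/server NLP component look like a
// pipeline annotator. Names and the service contract are assumptions.
class WrapperSketch {

    // What the assumed server returns: a label plus character offsets.
    static final class Span {
        final String label;
        final int begin;
        final int end;

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    // Client-side view of the legacy or vendor server (e.g., a socket client).
    interface SpanService {
        List<Span> analyze(String text) throws Exception;
    }

    // The wrapper: forwards the text, then converts each returned span into a
    // "Type/label:begin-end" record for the rest of the pipeline.
    static final class ServiceBackedAnnotator {
        private final String annotationType;
        private final SpanService client;

        ServiceBackedAnnotator(String annotationType, SpanService client) {
            this.annotationType = annotationType;
            this.client = client;
        }

        List<String> process(String text) {
            List<Span> spans;
            try {
                spans = client.analyze(text);
            } catch (Exception e) {
                // Fail loudly so a broken server is not mistaken for "no annotations".
                throw new IllegalStateException("service call failed for " + annotationType, e);
            }
            List<String> out = new ArrayList<>();
            for (Span s : spans) {
                out.add(annotationType + "/" + s.label + ":" + s.begin + "-" + s.end);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // In-process stand-in for the remote server: tags capitalized words.
        SpanService fakeServer = text -> {
            List<Span> spans = new ArrayList<>();
            int offset = 0;
            for (String token : text.split(" ")) {
                if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                    spans.add(new Span("PROPN", offset, offset + token.length()));
                }
                offset += token.length() + 1;
            }
            return spans;
        };
        ServiceBackedAnnotator parser = new ServiceBackedAnnotator("Parse", fakeServer);
        System.out.println(parser.process("rooms in Stever Hall"));
    }
}
```

Turning a failed call into an exception, rather than an empty result, is a deliberate choice: it keeps a dead server from looking like a message with nothing to annotate.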

RADAR Annotators: List

• In order:
  - Collection Reader
  - Email Opening
  - Conexor parser
  - Temporal Expr.
  - F-Structure
  - GFrame
  - DFrame
• Last:
  - CAS Consumer
• In between:
  - Task
  - RADAR Person
  - SCONE Semantics
  - Person Name
  - SCONE Implicit
  - Space Request
  - Typo
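
One reading of these constraints: the ordered annotators keep their relative order, the CAS Consumer runs last, and the "in between" annotators may run anywhere after the Collection Reader and before the CAS Consumer. The sketch below checks a candidate flow against that reading. Both the interpretation of "in between" and the helper `isValidFlow` are assumptions, not taken from the RADAR descriptors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical checker for the ordering constraints listed above.
class FlowOrderSketch {

    static final List<String> ORDERED = Arrays.asList(
            "Collection Reader", "Email Opening", "Conexor parser", "Temporal Expr.",
            "F-Structure", "GFrame", "DFrame");
    static final Set<String> IN_BETWEEN = new HashSet<>(Arrays.asList(
            "Task", "RADAR Person", "SCONE Semantics", "Person Name",
            "SCONE Implicit", "Space Request", "Typo"));
    static final String LAST = "CAS Consumer";

    // True if each of the 15 components appears exactly once, the ordered ones
    // keep their relative order, and the flow starts with the reader and ends
    // with the CAS Consumer.
    static boolean isValidFlow(List<String> flow) {
        int expected = ORDERED.size() + IN_BETWEEN.size() + 1;
        if (flow.size() != expected || new HashSet<>(flow).size() != expected) {
            return false;
        }
        if (!flow.get(0).equals(ORDERED.get(0)) || !flow.get(expected - 1).equals(LAST)) {
            return false;
        }
        int next = 0; // index of the next ordered annotator we expect to meet
        for (String name : flow) {
            if (next < ORDERED.size() && name.equals(ORDERED.get(next))) {
                next++;
            } else if (!IN_BETWEEN.contains(name) && !name.equals(LAST)) {
                return false; // unknown name, or an ordered annotator out of sequence
            }
        }
        return next == ORDERED.size();
    }

    public static void main(String[] args) {
        List<String> flow = new ArrayList<>(ORDERED);
        flow.addAll(IN_BETWEEN);
        flow.add(LAST);
        System.out.println(isValidFlow(flow)); // true
        Collections.swap(flow, 5, 6);          // put DFrame before GFrame
        System.out.println(isValidFlow(flow)); // false
    }
}
```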

Example Document

Blake,

They are really doing it. I won't even try to justify it. But it is not quite as bad as it
could have been -- the technology folks claim all they need is Wednesday and
Thursday. Please begin right awayto move all of the sessions that need moving.
And thank you very much for agreeing to fill in for Blake during this critical juncture.
The loss of the UC means we have a lot of work to do and your assistance is greatly
appreciated.

Our primary mission is to find replacement rooms for all of the Wednesday and
Thursday sessions and events currently scheduled in the University Center.

We've been told there maybe suitable rooms in Stever Hall. We have arranged
access to the Universitys conference planning web portal so you can make the
necessary vendor changes. I have also arranged to have Blake's original conference
schedule to be provided in the native Space Time Planner format. It should be on
your computer already.

I have been informed that the materials from your crash course in conference
planning are also on your computer. We have alloted $12,000 to make the
necessary changes. If you can do the job in less, it would be greatly appreciated.

Thanks again and sorry about the terrible news.
Jonathan Robertson, Program Committee Chair

[Screenshot: annotation viewer showing this message with callouts for a Typo Annotation, a RADAR Person annotation, and a Time Expression annotation; 13 annotations listed. Detail of the selected annotation:]

TypoAnnotation ("alloted")
  begin = 1069
  end = 1076
  Typo = alloted
  Suggestions = [balloted, allotted, alloyed, allowed]
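
Each suggestion shown for "alloted" (balloted, allotted, alloyed, allowed) is a single insertion or substitution away from the typo, which hints, though the slides do not say so, at some edit-distance lookup. A minimal sketch of that idea using Levenshtein distance over a word list; the threshold `MAX_DISTANCE`, the toy dictionary, and the method names are assumptions, not the RADAR Typo annotator's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical edit-distance typo suggester (not the RADAR Typo annotator's code).
class TypoSketch {

    static final int MAX_DISTANCE = 1;

    // Dynamic-programming Levenshtein distance (insert, delete, substitute),
    // keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Dictionary words within MAX_DISTANCE edits, in dictionary order.
    // A known word gets no suggestions.
    static List<String> suggest(String word, List<String> dictionary) {
        List<String> out = new ArrayList<>();
        if (dictionary.contains(word)) {
            return out;
        }
        for (String candidate : dictionary) {
            if (levenshtein(word, candidate) <= MAX_DISTANCE) {
                out.add(candidate);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList(
                "allocate", "allotted", "alloyed", "allowed", "balloted", "changes");
        System.out.println(suggest("alloted", dictionary)); // [allotted, alloyed, allowed, balloted]
    }
}
```

Plain Levenshtein counts a swap such as "teh" for "the" as two edits, so a threshold of 1 would miss it; the Damerau variant, which counts an adjacent transposition as one edit, is the usual fix. The offsets in the detail view are consistent with an exclusive end: 1076 - 1069 = 7, the length of "alloted".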

Sample Annotations: TempEx

String               Offset   Length
of the summer            73       13
This summer             175       11
three days              359       10
1 week                  493        6
Starting May 10)        774       16
July 4                 1971        6
INDEPENDENCE DAY       1939       16
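
Records like these (matched string, character offset, length) can be produced by scanning the text with patterns and reporting each match's start and extent. The toy sketch below uses a deliberately tiny, assumed pattern set; the slides do not describe how the Temporal Expr. annotator works, and a real one would cover far more forms and anchor each expression to a calendar date.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy temporal-expression spotter; the pattern set is an assumption for illustration.
class TempExSketch {

    private static final String MONTHS =
            "January|February|March|April|May|June|July|August|September|October|November|December";
    private static final String NUMBERS = "\\d+|one|two|three|four|five|six|seven|eight|nine|ten";

    // Alternatives: "<Month> <day>", "<number> day(s)/week(s)/month(s)",
    // and "this/next/last <season>" (the first word case-insensitive).
    private static final Pattern TEMPEX = Pattern.compile(
            "\\b(?:(?:" + MONTHS + ")\\s+\\d{1,2}"
            + "|(?:" + NUMBERS + ")\\s+(?:days?|weeks?|months?)"
            + "|(?i:this|next|last)\\s+(?:summer|winter|spring|fall))\\b");

    // Each result is {matched string, offset, length}, mirroring the table columns.
    static List<String[]> find(String text) {
        List<String[]> out = new ArrayList<>();
        Matcher m = TEMPEX.matcher(text);
        while (m.find()) {
            out.add(new String[] {m.group(), Integer.toString(m.start()),
                    Integer.toString(m.end() - m.start())});
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "This summer we need three days, starting May 10, plus 1 week after July 4.";
        for (String[] r : find(text)) {
            System.out.println(r[0] + "\t" + r[1] + "\t" + r[2]);
        }
    }
}
```

On the sample sentence this reports five expressions, from `This summer` (offset 0, length 11) to `July 4` (offset 67, length 6).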

Sample Annotations: Typo

String      Offset   Length   Value
teh             60        3   the, eh, ...
brousing        20        8   rousing, browsing
bris            16        4   ... brisk ...
midle           36        5   middle, ...
infor          117        5   inform, ...
committe        83        8   committed
fed            286        3   fled

Carnegie Mellon
Language Technologies Institute
NL Message Preprocessing with UIMA
Copyright © 2008, Carnegie Mellon. All Rights Reserved.

Sample Annotations: DFrame

String                              Offset   Len   Value
Which room is the first event in?       14    33   ((dframe ... (subj ((POS N) (attr ((POS NUM)
                                                    (function attr) (ortho first) (root first) ...

Sample Annotations: BriefingReq

String                                        Offset   Len   Value
I need a progress report on yesterday NOW          0    43   <node id="request1172260778347" ... </node>
please send me a campus map sooon. -- chian        0    44   <node id="request1172261238858" ... </node>

Annotator                %   Time (ms)   s/doc
DFrame               65.27     5310311   21.24
GFrame               24.60     2001145    8.00
RADAR Person         (values not legible in source)
SCONE Sem.           (values not legible in source)
Temporal Expr.       (values not legible in source)
Person Name           1.03       83563    0.33
SCONE Impl.           0.71       57742    0.23
F-Structure           0.54       44187    0.18
Email Opening         0.18       14889    0.06
SpaceRequest          0.17       13513    0.05
Conexor               0.17       13445    0.05
Typo                  0.07        5835    0.02
CAS Consumer          0.06        4746    0.02
Collection Reader     0.03        2725    0.01
Task                  0.03        2415    0.01
Entire Pipeline     100.00     8136349   32.55

Callout beside the DFrame and GFrame rows: expensive rule-based computations (structural transformation rules and KB lookups)

[sample: 250 randomly selected messages; s/doc = seconds per document]
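
The derived columns follow from the raw times: % is an annotator's time divided by the whole-pipeline time, and s/doc is the time converted to seconds and divided by the 250 sampled messages. The short check below recomputes them from the legible rows; the same arithmetic shows that the three rows with illegible values, plus any time not attributed to an annotator, account for the remaining 581,833 ms (about 7.15% of the run). `RunTimeTable` and its methods are illustrative names.

```java
import java.util.Locale;

// Recomputes the derived columns of the run-time table from its raw times.
class RunTimeTable {

    static final long PIPELINE_MS = 8_136_349L; // "Entire Pipeline" row
    static final int SAMPLE_DOCS = 250;         // 250 randomly selected messages

    // Legible rows of the table, in the order shown.
    static final String[] NAMES = {"DFrame", "GFrame", "Person Name", "SCONE Impl.",
            "F-Structure", "Email Opening", "SpaceRequest", "Conexor", "Typo",
            "CAS Consumer", "Collection Reader", "Task"};
    static final long[] TIMES_MS = {5_310_311L, 2_001_145L, 83_563L, 57_742L,
            44_187L, 14_889L, 13_513L, 13_445L, 5_835L, 4_746L, 2_725L, 2_415L};

    // "%" column: share of the whole-pipeline time.
    static double percentOfPipeline(long ms) {
        return 100.0 * ms / PIPELINE_MS;
    }

    // "s/doc" column: milliseconds converted to seconds, divided by the sample size.
    static double secondsPerDoc(long ms) {
        return ms / 1000.0 / SAMPLE_DOCS;
    }

    // Time not covered by the legible rows: the three illegible rows
    // plus any overhead not attributed to an annotator.
    static long unaccountedMs() {
        long sum = 0;
        for (long t : TIMES_MS) {
            sum += t;
        }
        return PIPELINE_MS - sum;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf(Locale.ROOT, "%-18s %6.2f %6.2f%n",
                    NAMES[i], percentOfPipeline(TIMES_MS[i]), secondsPerDoc(TIMES_MS[i]));
        }
        long rest = unaccountedMs();
        System.out.printf(Locale.ROOT, "%-18s %6.2f %6.2f  (%d ms)%n",
                "unaccounted", percentOfPipeline(rest), secondsPerDoc(rest), rest);
    }
}
```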

Annotator Run-Time

Numbered steps:
1. Read from ADB (Collection Reader)
2. Label salutations in email (Email Opening)
3. Segmentation, parsing (Conexor parser)
4. Add anchored time labels (Temporal Expr.)
5. Label grammatical roles (F-Structure)
6. Add general semantics (GFrame)
7. Add domain semantics (DFrame)
8. Write to ADB (CAS Consumer)

Other annotators:
- Label known person names
- Domain KB interpretation
- Label any person name
- Add domain KB features
- Label space requests
- Label typo fixes
- Label task requests

Sample Annotator Precision

Annotator                  % Correct   % Partly Correct
Vendor Order Annotator          100%                  -
Task Annotator                   73%                77%
Person Name Annotator            76%                85%
Space Request Annotator          64%                79%

[sample: 50 randomly selected messages]

Because the RADAR context is machine assistance in a human task, these precision figures should also be correlated with their effect on human task performance (currently assessed end-to-end, for the full system).

Cost of Adoption

• 1.5 months FTE to wrap and integrate 15 NLP components
  (programmer already familiar with UIMA)

Issues/Future Work

• Better robustness/decoupling
  - Require standard service interfaces for NLP components
  - Wrap as UIMA-EE (UIMA-AS) services
• Better transparency
  - Hard to tell whether a service is dead or just working hard
  - Need better logging/communication with services
• Better speed
  - Optimize rule-based engines
  - Provide multiple service instances for time-consuming services
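
A common answer to "dead or just working hard?" is a heartbeat that a service sends on a schedule, independent of any request in progress: a missing heartbeat means dead, while a recent heartbeat with a busy flag means working. A minimal sketch, assuming each service can report such beats; `HeartbeatMonitor`, its `Status` values, and the 5-second timeout are hypothetical and do not describe a UIMA-AS feature.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical heartbeat monitor for telling "dead" from "busy" services.
class HeartbeatMonitor {

    enum Status { IDLE, BUSY, DEAD, UNKNOWN }

    private final long timeoutMs;
    private final Map<String, Long> lastBeat = new ConcurrentHashMap<>();
    private final Map<String, Boolean> working = new ConcurrentHashMap<>();

    HeartbeatMonitor(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called by each service (e.g., from a background thread) every few seconds,
    // even while it is in the middle of a long request.
    void beat(String service, boolean busy, long nowMs) {
        lastBeat.put(service, nowMs);
        working.put(service, busy);
    }

    Status status(String service, long nowMs) {
        Long last = lastBeat.get(service);
        if (last == null) {
            return Status.UNKNOWN; // never heard from
        }
        if (nowMs - last > timeoutMs) {
            return Status.DEAD; // silent for longer than the timeout
        }
        return working.getOrDefault(service, false) ? Status.BUSY : Status.IDLE;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(5_000);
        monitor.beat("DFrame", true, 0);      // long rule-based run in progress
        monitor.beat("Conexor", false, 0);
        monitor.beat("DFrame", true, 4_000);  // still alive, still working
        System.out.println("DFrame:  " + monitor.status("DFrame", 8_000));
        System.out.println("Conexor: " + monitor.status("Conexor", 8_000));
    }
}
```

Timestamps are passed in explicitly so the logic is easy to test; a live monitor would use `System.currentTimeMillis()`. Here DFrame reports `BUSY` (its last beat is 4 s old) and Conexor reports `DEAD` (silent for 8 s).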

Questions?